feat: surface gate detail in the workflow run/resume --json payload by doquanghuy · Pull Request #2965 · github/spec-kit

doquanghuy · 2026-06-12T17:37:44Z

Description

Reference implementation for #2964 — for discussion, direction welcome.

When a run pauses at a gate, the --json outcome now carries a gate block (step_id / message / options / choice) so orchestrators can detect "human review needed" and present the options without parsing the human-facing stream. Two small pieces:

- The engine records each step's type in the run state's step results (one added line in step_data; previously the type was not recoverable from state).
- _workflow_run_payload adds the gate block via a _gate_outcome helper when the run's current step is a gate. choice is populated when the outcome ends at the gate with a decision recorded (e.g. an interactive rejection with on_reject: abort yields a failed payload carrying "choice": "reject"; an on_reject: retry pause likewise). A mid-flow approval proceeds past the gate, so the block clears, by design. Non-gate runs and runs that end elsewhere are unchanged: no gate key, payload byte-identical to today.
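For orchestrators, consuming the new block could look roughly like the sketch below. It assumes the outcome is a JSON object with a top-level `gate` key shaped as described above (step_id / message / options / choice). The `pending_gate` helper, the `status` values, and the sample payloads are hypothetical and only illustrate the idea; they are not taken from the PR's code.

```python
import json
from typing import Any, Optional


def pending_gate(payload_text: str) -> Optional[dict[str, Any]]:
    """Return the gate block if the run outcome carries one, else None."""
    payload = json.loads(payload_text)
    gate = payload.get("gate")
    # Runs that do not end at a gate carry no "gate" key at all.
    return gate if isinstance(gate, dict) else None


# Illustrative payloads; every value here is made up for the example.
paused = json.dumps({
    "status": "paused",
    "gate": {
        "step_id": "review",
        "message": "Approve the plan?",
        "options": ["approve", "reject"],
        "choice": None,
    },
})
completed = json.dumps({"status": "completed"})

gate = pending_gate(paused)
if gate is not None:
    print(f"Review needed at {gate['step_id']}: {gate['message']} {gate['options']}")
```

The point of the design is that the caller branches on the presence of the `gate` key alone and never has to read the human-facing stream.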

The issue lists alternatives (a generic paused_step block; a dedicated status value) — happy to rework toward either.

Testing

Ran existing tests with uv sync && uv run pytest — full suite 3727 passed
Two new CLI-level tests (TestWorkflowRunGateOutcomeJson): a gate pause carries the exact block (CliRunner stdin is non-TTY, so the gate pauses); a completed run has no gate key — the gate-pause test is red against current main, green with the change (verified both directions)
uvx ruff check src/ — clean
Tested locally with uv run specify --help
Tested with a sample project (covered by the CLI-level tests, which drive a real gate workflow through workflow run --json)

AI Disclosure

[ ] I did not use AI assistance for this contribution
[x] I did use AI assistance (describe below)

Code, tests, and this description were authored with AI assistance (Claude); verified by running the repo's test suite and ruff locally in both red and green directions.

doquanghuy · 2026-06-12T17:38:44Z

@mnriem when you have a moment, would appreciate your thoughts on the direction here — the issue lists the alternatives considered, and I'm happy to rework toward whichever shape fits Spec Kit best.

Copilot

Pull request overview

This PR extends the workflow CLI’s --json run/resume outcome payload to include structured details when the run is paused at a gate step, enabling external orchestrators to detect “human review needed” without parsing stdout.

Changes:

Record each executed step’s type into persisted step_results so step types are recoverable from run state.
Add an optional gate block to the workflow run --json / workflow resume --json payload when the current step is a gate.
Add CLI-level tests covering a non-interactive gate pause (includes gate block) and a non-gate completed run (no gate key).

Summary per file

| File | Description |
| --- | --- |
| `tests/test_workflows.py` | Adds CLI-level tests asserting `--json` includes a structured `gate` block on gate pauses and omits it for a normal completed run. |
| `src/specify_cli/workflows/engine.py` | Persists `type` in each step’s recorded `step_results` entry so step-type introspection is possible from run state. |
| `src/specify_cli/__init__.py` | Builds the `--json` outcome payload and conditionally injects `gate` details via a helper when the current step is a gate. |

Copilot's findings

Files reviewed: 3/3 changed files
Comments generated: 2

mnriem · 2026-06-16T20:37:12Z

Please address Copilot feedback

Address review (github#2965): _gate_outcome() emitted a gate block whenever current_step_id pointed at a gate step. Since RunState.current_step_id is never cleared on completion, a completed/failed run whose last step was a gate leaked stale gate detail in run/resume/status --json. Guard on status == paused. Also assert CLI success in the _run_json test helper before JSON-parsing, and add direct coverage for the suppression guard.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

doquanghuy · 2026-06-17T02:16:49Z

@mnriem Thanks for the review — addressed the Copilot feedback:

_gate_outcome() now only surfaces the gate block while the run is actually paused (guards on status == paused). Since RunState.current_step_id isn't cleared on completion, a completed/failed run whose last step was a gate no longer leaks stale gate detail in run/resume/status --json.
Hardened the _run_json test helper to assert CLI success before JSON-parsing, and added direct coverage for the suppression guard.

Full suite green; ruff check src/ clean. Ready for another look.

Copilot

Copilot's findings

Files reviewed: 3/3 changed files
Comments generated: 3

doquanghuy · 2026-06-17T13:46:11Z

@mnriem Pushed 5fd0f85 addressing the latest Copilot round:

Abort path now surfaces the gate block. _gate_outcome() emits the gate detail for aborted runs too, not only paused. Abort is the only path that sets ABORTED (gate rejection with on_reject: abort) and it leaves current_step_id on that gate, so an orchestrator can read the recorded choice for the stop. completed/failed stay suppressed.
Stable JSON schema. message is coerced to a string — GateStep only coerces it for expression interpolation, so a non-string YAML literal could otherwise leak into the payload.
Tests: added a CLI-level aborted-path test (test_gate_abort_carries_gate_block, asserts status == aborted and choice == reject), a message-coercion test, and extended the suppression test to allow aborted. Shared the run helper via _invoke_json to avoid duplicating invoke boilerplate.
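The guard described in this round can be pictured with the following minimal sketch. It is not the PR's `_gate_outcome`: the function takes plain values instead of a `RunState`, and the `"gate"` type label, the lowercase status strings, and the shape of `step_results` are assumptions made so the example is self-contained.

```python
from typing import Any, Optional

# Statuses that keep the gate block; "completed" and "failed" stay suppressed.
GATE_STATUSES = {"paused", "aborted"}


def gate_outcome(status: str, current_step_id: Optional[str],
                 step_results: dict[str, dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the gate block, or None when it would be stale or irrelevant."""
    if status not in GATE_STATUSES or current_step_id is None:
        return None
    step = step_results.get(current_step_id) or {}
    if step.get("type") != "gate":  # assumed type label for gate steps
        return None
    output = step.get("output") or {}
    message = output.get("message")
    return {
        "step_id": current_step_id,
        # A non-string YAML literal (e.g. 42) must not leak into the payload.
        "message": None if message is None else str(message),
        "options": output.get("options"),
        "choice": output.get("choice"),
    }


results = {"review": {"type": "gate", "output": {"message": 42, "choice": "reject"}}}
print(gate_outcome("aborted", "review", results))
print(gate_outcome("completed", "review", results))
```

Checking the status first is what prevents a completed or failed run from reporting a gate that `current_step_id` still happens to point at.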

Copilot

Copilot's findings

Files reviewed: 3/3 changed files
Comments generated: 1

mnriem · 2026-06-17T15:22:32Z

Please address Copilot feedback and rebase on upstream/main

doquanghuy · 2026-06-17T15:50:26Z

@mnriem Pushed 5f60408 for the latest Copilot round:

The gate-abort test parsed stdout without first asserting the CLI exited cleanly, so an invoke failure would have surfaced as an opaque JSON decode error. It now routes through _run_json (which asserts exit_code == 0 before parsing), and I dropped the now-redundant _invoke_json helper — a gate abort emits the payload and returns, so the run exits cleanly.

Full tests/test_workflows.py green (212 passed).

Copilot

Copilot's findings

Files reviewed: 3/3 changed files
Comments generated: 2

doquanghuy · 2026-06-17T17:40:06Z

@mnriem Pushed 3e303fb addressing the latest Copilot round:

Run-helper assertion message now uses result.output instead of result.stdout. Under --json, step output is redirected off stdout, so a failing run's useful diagnostics live on result.output; the JSON parse still reads stdout. This also brings _run_json in line with the other CLI tests in the file.
StepContext.steps docstring updated from the old 5-key entry shape to the canonical 7-key shape the engine actually persists (type, integration, model, options, input, output, status), so step authors and debuggers see the real record.
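As a reading aid, the 7-key record named above could be written down as a typed dictionary. This is only a sketch: the key names come from the list in this comment, while the value types, the `StepRecord` name, and the sample values are assumptions, since the thread does not show the docstring itself.

```python
from typing import Any, TypedDict


class StepRecord(TypedDict, total=False):
    """Hypothetical shape of one persisted step entry (value types assumed)."""
    type: str          # step type, newly persisted by this PR
    integration: Any
    model: Any
    options: Any
    input: Any
    output: Any
    status: str


# total=False because state written by older versions may lack some keys.
example: StepRecord = {"type": "shell", "status": "completed", "output": {}}
print(sorted(StepRecord.__annotations__))
```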

Full tests/test_workflows.py green (212 passed); ruff clean.

Heads up: GitHub now shows this branch as conflicting with main — that's from #2959 and #2963 having merged (they touch the same run-command / engine.py / test regions). I've left the branch as-is rather than rebase a public PR unprompted; happy to rebase onto current main and resolve if you'd like, just say the word.

A paused run was indistinguishable from any other pause in the machine-readable outcome, and the gate's prompt/options/choice never left the human-facing stream. Record each step's type in the run state's step results (one engine line) and, when the run sits at a gate, add a gate block (step_id/message/options/choice) to the payload so orchestrators can drive review gates without parsing stdout. Reference implementation for the proposal in github#2964. Addresses github#2964

Address Copilot review:
- `_gate_outcome` now also surfaces the gate block when a run is `aborted` by a gate rejection (`on_reject: abort`), not only when `paused`. Abort is the only path that sets ABORTED and it leaves current_step_id on the gate, so an orchestrator can read the recorded `choice` for the stop.
- Coerce `message` to a string (it may be a non-string YAML literal that GateStep only coerces for interpolation) so the JSON schema stays stable.
- Tests: add a CLI-level aborted-path test, a message-coercion test, and extend the suppression test to allow `aborted`; share the run helper via `_invoke_json` to avoid duplicating the invoke boilerplate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address Copilot review: the gate-abort test parsed stdout without first asserting the CLI exited cleanly, so an invoke failure would surface as an opaque JSON decode error. Route it through `_run_json` (which asserts exit_code == 0 before parsing) and drop the now-redundant `_invoke_json` helper: a gate abort emits the payload and returns, so the run exits 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address Copilot review:
- `_run_json` asserted with `result.stdout` in the message, but under `--json` step output is redirected off stdout, so the useful diagnostics live on `result.output`. Switch the assertion message to `result.output` (the JSON parse still reads stdout), matching the other CLI tests.
- `StepContext.steps` documented a 5-key entry shape; the engine now also persists `type` and `status`. Update the docstring to the canonical 7-key shape so step authors/debuggers see the real record.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

After rebasing onto main, a gate abort now emits the --json payload and then exits non-zero (`_run_outcome_exit_code` maps aborted → 1, from the merged exit-code work). Give `_run_json` an `expected_exit` parameter (default 0) so the abort case asserts exit 1 while the paused/completed cases stay at 0, keeping a single shared helper rather than duplicating the invoke boilerplate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

doquanghuy · 2026-06-17T17:58:02Z

@mnriem Rebased onto current main and resolved the conflict (force-pushed 24d6e85). The branch is now MERGEABLE.

What the rebase touched:

Conflict was a clean "keep both": #2959 (fix: non-zero exit code when a workflow run ends failed or aborted) added TestWorkflowRunExitCodes exactly where this PR adds TestWorkflowRunGateOutcomeJson, and both classes are preserved in full. _workflow_run_payload now (correctly) calls _gate_outcome and keeps _run_outcome_exit_code; the step_data["type"] field this PR relies on sits cleanly alongside the rest.
One real semantic interaction with #2959: a gate abort now emits the --json payload and then exits non-zero (aborted → 1). I gave the test helper an expected_exit param (default 0) so the abort test asserts exit 1 while paused/completed stay at 0: a single shared helper, no duplicated invoke boilerplate.
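The helper shape described here might look like the sketch below. The real `_run_json` invokes the CLI through a test runner; to stay self-contained, this version takes an already-produced result object that exposes `exit_code`, `output`, and `stdout`. The `run_json` name and the stand-in result are hypothetical.

```python
import json
from types import SimpleNamespace
from typing import Any


def run_json(result: Any, expected_exit: int = 0) -> dict:
    """Assert the expected exit code, then parse the JSON payload from stdout."""
    # result.output carries the diagnostics; stdout carries only the JSON payload.
    assert result.exit_code == expected_exit, result.output
    return json.loads(result.stdout)


# Stand-in for a runner result of a gate abort (values are illustrative).
aborted = SimpleNamespace(
    exit_code=1,
    output="gate rejected\n",
    stdout='{"status": "aborted"}',
)
payload = run_json(aborted, expected_exit=1)
print(payload["status"])
```

Asserting the exit code before parsing keeps a failed invocation from showing up as an opaque JSON decode error, and the default of 0 leaves the paused and completed call sites unchanged.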

All prior Copilot fixes are intact (paused/aborted guard, message coercion, result.output assert message, 7-key StepContext.steps docstring). Full specify_cli test suite green locally (3879 passed); ruff clean.

Copilot

Copilot's findings

Files reviewed: 4/4 changed files
Comments generated: 1

Address Copilot review:
- A run paused by an older version has no persisted step `type`, so `_gate_outcome` would never surface its gate block on resume. Add `_is_gate_step`: prefer the `type` field, but when it is absent fall back to the gate's unique output signature (`on_reject`, written only by GateStep). A record with a different known `type` is still not a gate.
- Normalize `options` to a list of strings (mirroring the `message` coercion) so an unvalidated workflow with non-string options can't destabilize the JSON schema.
- Tests: options coercion, type-less gate detection, and a type-less non-gate negative case.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

doquanghuy · 2026-06-17T18:59:57Z

@mnriem Pushed d322cf5 for the latest Copilot comment:

Backward-compatible gate detection. A run paused by an older version has no persisted step type, so _gate_outcome would have dropped its gate block on resume. Added _is_gate_step: it prefers the type field, and when type is absent falls back to the gate's unique output signature (on_reject, which only GateStep writes). A record carrying a different known type is still not treated as a gate.
Options normalized to strings. Mirroring the existing message coercion, options is now normalized to a list of strings so an unvalidated workflow with non-string options can't destabilize the JSON schema.
Tests: options coercion, type-less gate detection (resume path), and a type-less non-gate negative case.
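The detection rule above can be sketched as follows. This is an illustration under assumptions, not the PR's `_is_gate_step`: it assumes a step record is a dict, that gate steps carry the type label `"gate"`, and that `on_reject` appears as a key of the step's `output`.

```python
from typing import Any


def is_gate_step(step: dict[str, Any]) -> bool:
    """Decide whether a persisted step record belongs to a gate."""
    step_type = step.get("type")
    if step_type is not None:
        # A persisted type is authoritative: any other known type is not a gate.
        return step_type == "gate"
    # Older run state has no type; fall back to the key only gate output carries.
    output = step.get("output") or {}
    return "on_reject" in output


print(is_gate_step({"type": "gate"}))
print(is_gate_step({"output": {"on_reject": "abort"}}))
print(is_gate_step({"type": "shell", "output": {"on_reject": "abort"}}))
```

Preferring `type` keeps the fallback from misclassifying a newer non-gate record that happens to carry an `on_reject` key in its output.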

Full tests/test_workflows.py green (291 passed); ruff clean.

Copilot

Copilot's findings

Files reviewed: 4/4 changed files
Comments generated: 1

Address Copilot review: the prior options normalization only mapped a `list`, returning the raw value for any other shape (scalar/tuple), which contradicted the "stable list[str]" intent. Extract `_normalize_gate_options`: None stays None; list/tuple maps each element through str; any other scalar becomes a single-element list (a bare string is one option, never iterated character-by-character). The emitted schema is now always list[str] | None. Extend the options test to cover list, tuple, bare string, numeric scalar, and None.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

doquanghuy · 2026-06-17T19:25:04Z

@mnriem Pushed a97cd18 for the latest Copilot comment:

Non-list options now normalized too. The previous pass only mapped a list, returning the raw value for any other shape — so a scalar or tuple leaked through unnormalized, contradicting the list[str] intent. Extracted _normalize_gate_options: None stays None; a list/tuple maps each element through str; any other scalar becomes a single-element list (a bare string is treated as one option, never iterated character-by-character). The emitted schema is now always list[str] | None.
Tests extended to cover list, tuple, bare string, numeric scalar, and None.
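The normalization rules listed above translate almost directly into code. The following is a sketch written from that description rather than a copy of the PR's `_normalize_gate_options`; the function name without the leading underscore is used to mark it as illustrative.

```python
from typing import Any, Optional


def normalize_gate_options(options: Any) -> Optional[list[str]]:
    """Normalize gate options to list[str] | None."""
    if options is None:
        return None
    if isinstance(options, (list, tuple)):
        return [str(item) for item in options]
    # Any other scalar, including a bare string, is a single option.
    # Checking list/tuple explicitly avoids iterating a string per character.
    return [str(options)]


for raw in (["approve", 2], ("a", "b"), "approve", 3, None):
    print(repr(raw), "->", normalize_gate_options(raw))
```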

Full tests/test_workflows.py green (291 passed); ruff clean.

Copilot

Copilot's findings

Files reviewed: 4/4 changed files
Comments generated: 2

+    output = step.get("output") or {}
+    # `message` and `options` may be non-string YAML literals in an unvalidated
+    # workflow (GateStep coerces neither for the payload), so normalise both
+    # here for a stable JSON schema: message → str, options → list[str] | None.
+    message = output.get("message")
+    return {
+        "step_id": state.current_step_id,
+        "message": None if message is None else str(message),
+        "options": _normalize_gate_options(output.get("options")),
+        "choice": output.get("choice"),
+    }


+steps:
+  - id: fine
+    type: shell
+    run: "true"


Address Copilot review:
- `_gate_outcome` normalized `message` and `options` but passed `choice` through as-is; an unvalidated gate can record a non-string `choice`, which contradicts the stable-schema rationale. Coerce `choice` to `str | None` (None still means "no decision yet"), consistent with the other two fields. Adds a focused choice-coercion test.
- The plain (no-gate) test workflow used `run: "true"`, which fails under cmd.exe on Windows (ShellStep uses shell=True). Use the cross-platform `run: "exit 0"` (matching the exit-code suite's workflows).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

doquanghuy · 2026-06-17T19:50:36Z

@mnriem Pushed c4297a0 for the latest Copilot round:

choice now normalized too. _gate_outcome normalized message and options but passed choice through as-is — an unvalidated gate can record a non-string choice, contradicting the stable-schema rationale. It is now coerced to str | None (None still means "no decision yet"), consistent with the other two fields. Added a focused choice-coercion test (None / string / non-string).
Portable no-gate test. The plain (no-gate) test workflow used run: "true", which fails under cmd.exe on Windows (ShellStep uses shell=True). Switched to the cross-platform run: "exit 0", matching the exit-code suite's workflows. (Scoped to the one occurrence this PR introduced; two pre-existing run: "true" lines elsewhere are upstream and left untouched to keep the diff tight.)
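The `str | None` coercion applied to `choice` is small enough to show in full. The helper name `str_or_none` is hypothetical; the thread only states the rule (None is preserved, everything else goes through `str`).

```python
from typing import Any, Optional


def str_or_none(value: Any) -> Optional[str]:
    """Coerce a recorded value to str while keeping None as 'no decision yet'."""
    return None if value is None else str(value)


# A YAML literal such as `yes` can be parsed as a boolean, hence the True case.
for raw in (None, "reject", 2, True):
    print(repr(raw), "->", repr(str_or_none(raw)))
```

Keeping None distinct from the string "None" matters here, because consumers read a missing decision as "still waiting for review".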

Full tests/test_workflows.py green (292 passed); ruff clean.

doquanghuy requested a review from mnriem as a code owner June 12, 2026 17:37

mnriem requested a review from Copilot June 16, 2026 13:45

Copilot started reviewing on behalf of mnriem June 16, 2026 13:45

Copilot AI reviewed Jun 16, 2026

mnriem requested a review from Copilot June 17, 2026 12:25

Copilot started reviewing on behalf of mnriem June 17, 2026 12:25